DESCRIPTION

Mapping of semantic tags to phrases for grammar generation 
Field of the invention 

The present invention relates to the field of automated language understanding for 
dialogue applications. 

Background and prior art 

Automatic dialogue systems and telephone based machine enquiry systems are
nowadays widely used for providing information, e.g. train or flight timetables, or for
receiving enquiries from a user, e.g. bank transactions or travel bookings. The crucial
task of an automatic dialogue system consists of extracting the information necessary
for the dialogue system from a user input, which is typically provided by speech.

The extraction of information from speech can be divided into two steps: speech
recognition on the one hand and mapping of recognized speech to semantic
meanings on the other hand. The speech recognition step transforms the speech
received from a user into a form that can be machine processed. It is then of
essential importance that the recognized speech is interpreted by the automatic
dialogue system in the correct way. Therefore, an assignment or a mapping of
recognized speech to a semantic meaning has to be performed by the automatic
dialogue system. For example, for a train timetable dialogue system and the enquiry
"I need a connection from Hamburg to Munich", the two cities "Hamburg" and "Munich"
have to be properly identified as origin and destination of the train travel.

Essential fragments of the above sentence, such as "from Hamburg" or "to Munich", have
to be extracted and understood by the automatic dialogue system to the extent that the
phrase "from Hamburg" is mapped to the origin semantic tag whereas the phrase "to
Munich" is mapped to the destination semantic tag. When all semantic tags like origin,
destination, time, date, or other travel specifications are mapped to phrases of the user
enquiry, the dialogue system can perform a required action.
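The mapping of the example enquiry can be pictured with a small sketch. The phrase table, the tag names as strings and the function name `extract_slots` are illustrative assumptions only; a real dialogue system would use a grammar rather than a fixed lookup.

```python
# Hypothetical phrase table: which phrase carries which semantic tag.
PHRASE_TO_TAG = {
    "from Hamburg": "origin",
    "to Munich": "destination",
}


def extract_slots(sentence: str) -> dict[str, str]:
    """Return {semantic tag: phrase} for every known phrase found in the sentence."""
    return {tag: phrase for phrase, tag in PHRASE_TO_TAG.items() if phrase in sentence}


slots = extract_slots("I need a connection from Hamburg to Munich")
```

Once both tags are filled, the dialogue system has the information it needs for a timetable lookup.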

The assignment or mapping of recognized phrases to semantic tags is typically provided
by some kind of grammar. A grammar contains rules defining the mapping of semantic
tags to the phrases. Such rule based grammars have been the most investigated subject
of research in the field of natural language understanding and are often incorporated in
actual dialogue systems. An example of an automatic dialogue system as well as a
general description of automatic dialogue systems is given in the paper "H. Aust, M.
Oerder, F. Seide, V. Steinbiss; The Philips Automatic Train Timetable Information
System, Speech Communication 17 (1995) 249-262".



Since an automatic dialogue system is typically dedicated to a distinct purpose, e.g.
a timetable information or an enquiry processing system, the underlying grammar is
individually designed for that distinct purpose. Most of the grammars known in the
prior art are manually written in the sense that the rules constituting the grammar cover
a huge set of phrases and various combinations of phrases that may appear within a
dialogue.



In order to perform a mapping between a phrase and a semantic tag, the phrase or the
combination of phrases has to match at least one of the rules of the manually written
grammar. The generation of such a hand written grammar is an extremely time
consuming and resource intensive process, since every possible combination of phrases
or variation of a dialogue has to be explicitly taken into account by means of
individual rules. Furthermore a manually created grammar is always subject to
maintenance, because the underlying set of rules may not cover all types of dialogues
and types of phrases that typically occur during operation of the automatic dialogue
system.

multiplicity of different grammars represents a considerable cost factor which should be
minimized.

In order to reduce the rather costly amount of manual effort for generation, maintenance
and adaptation of grammars, methods for an automatic generation of grammars or
automatic learning of grammars have been introduced recently. An automatic
construction of a grammar is typically based on a corpus of weakly annotated training
sentences. Such a training corpus can for example be derived by logging the dialogue of
an existing application. However, automatic learning further requires a set of
annotations indicating which phrases of the training corpus are assigned to which
known tag. Typically, this annotation has to be performed manually but it is in general
less time consuming than the generation of an entire grammar.
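A weakly annotated corpus can be pictured as sentences that carry their tags without any alignment to individual phrases. The following is a minimal sketch under that reading; the class name, field names and example sentences are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeaklyAnnotatedSentence:
    phrases: tuple[str, ...]  # ordered phrases of the sentence
    tags: tuple[str, ...]     # tags known to occur; their order carries no meaning


corpus = [
    WeaklyAnnotatedSentence(("from Hamburg", "to Munich"), ("origin", "destination")),
    WeaklyAnnotatedSentence(("to Berlin", "from Munich"), ("destination", "origin")),
]


def candidate_tags(sentences, phrase):
    """Union of the tags of all sentences in which the phrase occurs."""
    result = set()
    for sentence in sentences:
        if phrase in sentence.phrases:
            result |= set(sentence.tags)
    return result
```

The annotation only states which tags belong to a sentence, so every tag of the sentence remains a candidate for each of its phrases.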

The paper "K. Macherey, F. J. Och and H. Ney; Natural Language Understanding using
Statistical Machine Translation", presented at the 7th European Conference on Speech
Communication and Technology, Aalborg, Denmark, September 2001, which is also
available from the URL
"http://wasserstoff.informatik.rwth-aachen.de/Colleagues/och/eurospeech2001.ps",
describes the automatic learning of a grammar.

In fact the document discloses an approach to natural language understanding which is
derived from the field of statistical machine translation. The problem of natural
language understanding is described as a translation from a source sentence to a formal
language target sentence. This method therefore aims to reduce the employment of
grammars in favour of automatically learning dependencies between words and their
meaning. To this extent the mentioned method deals with a translation problem
rather than with the automatic generation of a grammar.

In contrast to that, the US Patent application US 2003/0061024 A1 explicitly
concentrates on the learning of a grammar. This method is based on determining
sequences of terminals or of terminals and wild cards linked to non terminals of a
grammar in a training corpus of sentences. After sequences of terminals or of terminals
and wild cards have been determined they are assigned to a non terminal or to no non
terminal by means of a classification procedure. This classification in turn uses an
exchange procedure which is based on an exchange algorithm. The exchange algorithm
guarantees an efficient optimization of a target function which takes account of all
incorrect classifications and which is iteratively optimized in the classification of the
sequences of terminals or of terminals and wild cards. Thereby the order of the non
terminals in the training sentences does not have to be annotated manually since the
target function uses only the information as to which sequences of terminals or of
terminals and wild cards and which non terminals are present in the training sentences.
Furthermore, the exchange procedure guarantees an efficient (local) optimization of the
target function since only a few operations are necessary for calculating the change in
the target function upon the execution of an exchange.

The present invention aims to provide another method for mapping semantic tags to
phrases and thereby providing the generation of a grammar for an automatic dialogue
system.

Summary of the invention 

The invention provides an automatic learning of semantically useful word phrases from
weakly annotated corpus sentences. Thereby a probabilistic dependency between word
phrases and semantic concepts or semantic tags is estimated. The probabilistic
dependency describes the likelihood that a given phrase is mapped or assigned to a
distinct semantic tag. In this context a phrase is used as a generic term for fragments of
a sentence, a sequence of words or in the minimal case a single word.

annotation can be realized for example by assigning a set of candidate semantic tags to
a phrase. Alternatively an IEL (inclusion/exclusion list) can be used. An IEL represents
a list that includes or excludes various semantic tags that can be mapped or must not
be mapped to a phrase.
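One possible reading of an inclusion/exclusion list is a filter on the candidate tags of a phrase. The sketch below assumes that reading; the function name `apply_iel` and its parameters are hypothetical and not taken from the application.

```python
def apply_iel(candidates, included=None, excluded=frozenset()):
    """Filter the candidate tags of one phrase.

    included: if given, only these tags may be mapped to the phrase.
    excluded: tags that must not be mapped to the phrase.
    """
    allowed = set(candidates)
    if included is not None:
        allowed &= set(included)
    return allowed - set(excluded)


remaining = apply_iel({"origin", "destination", "time"}, excluded={"time"})
```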

According to a preferred embodiment of the invention, for each phrase of the training
corpus an entire set of mapping probabilities between the phrase and the corresponding
set of candidate semantic tags is determined. In this way a probability that a given
phrase is assigned to a semantic tag is calculated for each possible combination
between the phrase and the entire set of candidate semantic tags, which results in an
automatic learning or generation of a grammar.

According to a further preferred embodiment of the invention, a semantic tag is mapped
to a phrase of the training corpus in accordance with the highest mapping probability of
the set of mapping probabilities. This means that the mapping or assigning of a tag to a
given phrase of the training corpus is determined by the highest probability of the set of
mapping probabilities for the given phrase.
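Selecting the tag with the highest mapping probability amounts to an argmax over the candidate tags of a phrase. A minimal sketch follows; the probability values are invented for illustration only.

```python
def best_tag(mapping_probs: dict[str, float]) -> str:
    """Return the candidate tag with the highest mapping probability."""
    return max(mapping_probs, key=mapping_probs.get)


# Invented mapping probabilities for the phrase "from Hamburg".
probs = {"origin": 0.85, "destination": 0.10, "time": 0.05}
chosen = best_tag(probs)
```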

The method for mapping semantic tags to phrases therefore makes explicit use of the
determination of mapping probabilities. Such a mapping probability can for example be
determined from the given weak annotation between phrases and semantic tags of the
training corpus. Generally, there exists a plurality of probabilistic means to generate
such a mapping probability.

According to a further preferred embodiment of the invention, the statistical procedure,
hence the calculation of the mapping probabilities, is performed by means of an
expectation maximization (EM) algorithm. EM algorithms are commonly known from
forward backward training for Hidden Markov Models (HMM). A specific
implementation of the EM algorithm for the calculation of mapping probabilities is
given in the mathematical annex.

According to a further preferred embodiment of the invention, a grammar can be
derived from the performed mappings between a candidate semantic tag and a phrase.
Preferably the calculated and performed mappings are stored by some kind of storing
means in order to keep the computational effort at a low level. Finally, the derived
grammar can be applied to new, unknown sentences.

The overall performance of the method of the invention can be enhanced when the EM
algorithm is applied iteratively. In this case the result of an iteration of the EM
algorithm is used as input for the next iteration. For example an estimated probability
that a phrase is mapped to a tag is stored by some kind of storing means and can then be
reused in a subsequent application of the EM algorithm. In a similar way the initial
conditions in form of weak annotations between phrases and tags or in form of an IEL
can be modified according to previously performed mapping procedures according to
the EM algorithm.

In order to test the efficiency and reliability of an EM based algorithm for grammar
learning, the EM based algorithm has been implemented by making use of a so called
Boston Restaurant Guide corpus. Experiments based on this implementation
demonstrate that an EM based procedure leads to better results than a procedure based
on an exchange algorithm as illustrated in US Pat. No. 2003/0061024 A1, especially
when large training corpora are used. Furthermore, it has been demonstrated that a
repeated application of the EM based procedure leads to continuous improvements of
the generated grammar. The tag error rate, which is defined as the ratio between the
number of falsely mapped tags and the total number of tags, shows a monotone descent
when described as a function over the number of iterations. The main improvements of

Figure 1 is illustrative of a flow chart for the mapping of phrases and tags by
means of an EM based algorithm,
Figure 2 shows a flow chart illustrating a dynamic programming construction of a
table L which is a subroutine for the EM algorithm,
Figure 3 is illustrative of a flow chart describing the implementation of the EM
algorithm.

Detailed Description 

Figure 1 shows a flow chart for mapping of semantic tags to phrases based on the EM
algorithm. In a first step 100 a phrase w is extracted from a training corpus sentence.
In the following step 102 a set of mapping probabilities p(k, w) is calculated for each
tag k from a list of unordered tags κ.

Once a set of mapping probabilities has been calculated for the phrase w, the highest
probability of the set of mapping probabilities p(k, w) is determined in the following
step 104. In the next step 106 the mapping between the phrase w and a semantic tag k
is performed. The phrase w is mapped to a single tag k according to the highest
probability p(k, w) of the set of mapping probabilities, which has been determined in
step 104. In this way the mapping between a semantic tag k and a phrase w is
performed by making use of a probabilistic estimation based on a training corpus. The
probabilistic estimation determines the likelihood that a semantic tag k is mapped to a
phrase w within the training corpus. When the mapping has been performed in step
106 it is stored by some kind of storing means in step 108 in order to provide the
performed mapping for a subsequent application of the algorithm. In this way, the
procedure can be performed iteratively, leading to a decrease of the tag error rate and
thus to an enhancement of the reliability and efficiency of the entire grammar learning
procedure.
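The tag error rate, i.e. the ratio between the number of falsely mapped tags and the total number of tags, could be computed as in the following sketch. It assumes that a reference tag is available for every mapped phrase; the function name and argument layout are illustrative.

```python
def tag_error_rate(predicted, reference):
    """Ratio of falsely mapped tags to the total number of tags."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one tag is required")
    errors = sum(p != r for p, r in zip(predicted, reference))
    return errors / len(reference)


rate = tag_error_rate(["origin", "time"], ["origin", "destination"])
```

Evaluating this value after every iteration would give the curve over the number of iterations that the text refers to.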

The calculation of the mapping probability which is performed in step 102 is based on
the EM algorithm, which is explicitly explained in the mathematical annex by making
reference to figure 2 and figure 3.

The calculation of the mapping probability according to the EM algorithm is based on
two additional probabilities denoted as L(i, κ') and R(i, κ'), respectively, representing
the probabilities for all permutations of an unordered tag sublist κ' of length i-1 over
the left subsentence and of the unordered complement tag sublist over the right
subsentence of a training corpus sentence from position i + 1.

Figure 2 is illustrative of a flow chart for calculating the probability L(i, κ').

In a first step 200, the initial probability for i = 0 is set to unity before, in the next
step 202, the index i of the tag sublist is initialized to i = 1. In the following step 204,
each sublist κ' of length i is selected from the unordered tag list κ. After selecting each
sublist the calculation procedure continues with step 206, in which the probability
L(i, κ') for a permutation is set to zero. Then, in step 208 each tag k from the
unordered sublist is selected, followed by step 210 in
which the permutation probability is calculated according to:

$$L(i, \kappa') = L(i, \kappa') + L(i-1, \kappa' \setminus \{k\}) \cdot p(k \mid w_i).$$

After the calculation of L(i, κ'), in step 212, the index i is compared to the number of
words in the phrase W. If i is less than or equal to |W|, the procedure returns to step 204
by incrementing i by one.

Once the permutation probability has been calculated according to the procedure
described in figure 2, an analogous calculation is performed in order to obtain the
permutation probability R for the complement sublist of the right subsentence.
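The table construction of figure 2 can be sketched as follows. The sketch assumes that tags are addressed by their position in the unordered tag list, so that repeated tags stay distinct, and that `p` is a hypothetical callable returning p(k | w); the function name and the dictionary layout are illustrative choices, not the implementation of the application.

```python
from itertools import combinations


def build_left_table(phrases, tags, p):
    """Return L as a dict keyed by (i, frozenset of tag positions of size i).

    L[(i, subset)] sums, over all orderings of the tags in `subset`, the product
    of p(tag, phrase) over the first i phrases.
    """
    table = {(0, frozenset()): 1.0}  # step 200: empty sublist has probability 1
    for i in range(1, len(phrases) + 1):
        for combo in combinations(range(len(tags)), i):  # step 204
            subset = frozenset(combo)
            total = 0.0  # step 206
            for t in subset:  # steps 208 and 210
                total += table[(i - 1, subset - {t})] * p(tags[t], phrases[i - 1])
            table[(i, subset)] = total
    return table


# Invented probabilities p(k | w) for a two-phrase sentence.
probs = {
    ("origin", "from Hamburg"): 0.9, ("destination", "from Hamburg"): 0.1,
    ("origin", "to Munich"): 0.2, ("destination", "to Munich"): 0.8,
}
L = build_left_table(
    ["from Hamburg", "to Munich"], ["origin", "destination"],
    lambda k, w: probs[(k, w)],
)
```

For the full tag list the entry equals the sum over both tag orderings, 0.9 · 0.8 + 0.1 · 0.2. The table R could be obtained by calling the same function on the reversed phrase list.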

Figure 3 finally illustrates the implementation of the EM algorithm for calculating a 
mapping probability p(k, w) by making use of the above described permutation 
probabilities. 

In the first step 300 for all tags k and phrases w the probability p(k | w) is initialized
by setting q = 0 and setting q(k, w) = 0, before in step 302 one of the training corpus
sentences is selected. Since every sentence of the training corpus is taken into account
for the grammar learning, the following step 304 has to be applied to all sentences of
the training corpus.

After a sentence of the training corpus has been selected in step 302 it is further
processed in step 304, in which the steps 306, 308, 310, and 312 are successively
performed. In step 306, an unordered tag list κ as well as an ordered phrase list W are
selected. In the next step 308, the dynamic programming construction of the table L is
performed as described in figure 2. After that, a similar procedure is performed with the
reversed table R in step 310.

The calculated tables L and R as well as the initialized probabilities are further
processed in step 312. Step 312 can be interpreted as a nested loop with an index
i = 1, ..., |W|. For each i, step 314 is performed, initializing another loop for each of the
unordered sublists κ' of length i - 1. For each unordered sublist the step 316 is
performed, selecting each tag k ∉ κ' and performing the following calculation in step
318:

$$q' = L(i-1, \kappa') \cdot p(k \mid w_i) \cdot R(i+1, (\kappa \setminus \kappa') \setminus \{k\}),$$

where q' is further processed in step 320 according to:

$$q(k, w_i) = q(k, w_i) + q' \quad \text{and} \quad q = q + q'.$$

When the steps 318 and 320 have been executed for each tag k ∉ κ' in step 316, when
step 316 has been performed for each unordered sublist of length i - 1 in step 314,
when step 314 has been performed for each index i ≤ |W| in step 312, and when finally
the entire procedure given by step 312 has been performed for each sentence of the
training corpus, then in step 322 the mapping probability is determined according to:

$$p(k, w) = q(k, w) / q \quad \forall\, k, w.$$
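One iteration of steps 300 to 322 can be sketched as below. This is a sketch under stated assumptions: tags are addressed by position, the conditional p(k | w) is derived from the stored p(k, w) by dividing by its marginal over k, the table R is obtained from the reversed phrase list, and sentences with unequal numbers of tags and phrases are skipped. The names `left_table` and `em_iteration` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def left_table(phrases, tags, cond):
    """Table L of figure 2, keyed by (i, frozenset of tag positions)."""
    table = {(0, frozenset()): 1.0}
    for i in range(1, len(phrases) + 1):
        for combo in combinations(range(len(tags)), i):
            subset = frozenset(combo)
            table[(i, subset)] = sum(
                table[(i - 1, subset - {t})] * cond.get((tags[t], phrases[i - 1]), 0.0)
                for t in subset
            )
    return table


def em_iteration(corpus, joint):
    """corpus: list of (phrases, tags); joint: {(tag, phrase): p(k, w)}."""
    marginal = defaultdict(float)
    for (_, w), value in joint.items():
        marginal[w] += value
    cond = {(k, w): v / marginal[w] for (k, w), v in joint.items() if marginal[w] > 0}

    q_total = 0.0                # step 300: q = 0
    q_kw = defaultdict(float)    # step 300: q(k, w) = 0
    for phrases, tags in corpus:  # steps 302 and 304
        if len(phrases) != len(tags):
            continue
        s = len(phrases)
        everything = frozenset(range(s))
        L = left_table(phrases, tags, cond)        # step 308
        R = left_table(phrases[::-1], tags, cond)  # step 310, reversed phrases
        for i in range(1, s + 1):                            # step 312
            for combo in combinations(range(s), i - 1):      # step 314
                left = frozenset(combo)
                for t in everything - left:                  # step 316
                    right = everything - left - {t}
                    # step 318: R[(s - i, right)] covers the last s - i phrases
                    q_prime = (L[(i - 1, left)]
                               * cond.get((tags[t], phrases[i - 1]), 0.0)
                               * R[(s - i, right)])
                    q_kw[(tags[t], phrases[i - 1])] += q_prime  # step 320
                    q_total += q_prime
    if q_total == 0.0:
        return {}
    return {key: value / q_total for key, value in q_kw.items()}  # step 322
```

The returned dictionary has the same layout as the input, so it could be passed to the next call to iterate the procedure.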

Once the mapping probability has been determined, it is preferably stored by some kind
of storing means. For the purpose of grammar learning and for mapping a tag to a given
phrase all probabilities of all possible combinations of phrases and candidate semantic
tags are calculated and stored. Finally, the mapping of a semantic tag to a given phrase
is performed according to the maximum probability of all calculated probabilities for
the given phrase.

Based on the plurality of performed mappings, the grammar is finally deduced and can
be applied to other and hence unknown sentences that may occur in the framework of
an automated dialogue system.

Especially when the EM algorithm is repeatedly applied to a training corpus of

MATHEMATICAL ANNEX

According to a preferred embodiment of the invention, the mapping probability
p(k, w) that a given phrase w is mapped to a semantic tag k is calculated by means
of an expectation maximization (EM) algorithm. The implementation and adaptation of
an EM algorithm are described in this section.

Here, an approach which is similar to forward backward training of HMMs is followed.
The general equation for EM based grammar learning is given by:

where W is a sequence of phrases, K is a tag sequence, w is a phrase, k
is a semantic tag, N_K(k, w) is the number of times that k and w occur together for a
given W and K, and p(K | W) gives the probability that a sequence of phrases is mapped
to a tag sequence K.

This approach assumes that the number of tags s equals the number of phrases. The
numerator of equation (1):

$$\sum_{K} p(K \mid W) \cdot N_K(k, w)$$

adds for each tag sequence K the probability p(K | W) as many times as the tag k is
mapped to phrase w in this tag sequence. This may be rewritten as follows:

$$
\begin{aligned}
\sum_{K} p(K \mid W) \cdot N_K(k, w)
  &= \sum_{K} \sum_{i} p(K \mid W) \cdot \delta(k_i, k) \cdot \delta(w_i, w) \\
  &= \sum_{i:\, w_i = w} \; \underbrace{\sum_{K:\, k_i = k} p(K \mid W)}_{= p(k_i = k \mid W)}
\end{aligned}
$$

where δ(x, y) is the usual delta function
and p(k_i = k | W) is the overall probability that the phrase w at position i in the phrase
string W is mapped to tag k. Similarly, for the denominator of Eq. (1) the following
holds:

resulting in the estimation formula

$$p(k \mid w) = \frac{\sum_{i:\, w_i = w} p(k_i = k \mid W)}{\sum_{i:\, w_i = w} 1} \qquad (2)$$



For the estimation over the whole corpus, numerator and denominator must be
separately computed and summed up for each corpus sentence.
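For very short sentences, formula (2) can be checked by enumerating all tag orderings directly. The sketch below assumes that p(K | W) is proportional to the product of the per-position probabilities p(k_j | w_j) and is normalized over the orderings of the annotated tag list; this normalization is an assumption of the sketch and is not stated in the text. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def estimate_by_enumeration(corpus, cond):
    """corpus: list of (phrases, tags); cond: {(tag, phrase): p(k | w)}."""
    numer = defaultdict(float)  # summed p(k_i = k | W) over positions with w_i = w
    denom = defaultdict(float)  # number of positions with w_i = w
    for phrases, tags in corpus:
        weights = {}
        for order in set(permutations(tags)):
            weight = 1.0
            for k, phrase in zip(order, phrases):
                weight *= cond.get((k, phrase), 0.0)
            weights[order] = weight
        norm = sum(weights.values())
        if norm == 0.0:
            continue  # no ordering is possible under the current probabilities
        for order, weight in weights.items():
            for k, phrase in zip(order, phrases):
                numer[(k, phrase)] += weight / norm
        for phrase in phrases:
            denom[phrase] += 1.0
    return {(k, w): value / denom[w] for (k, w), value in numer.items()}


cond = {
    ("origin", "from Hamburg"): 0.9, ("destination", "from Hamburg"): 0.1,
    ("origin", "to Munich"): 0.2, ("destination", "to Munich"): 0.8,
}
estimate = estimate_by_enumeration(
    [(("from Hamburg", "to Munich"), ("origin", "destination"))], cond
)
```

The enumeration grows factorially with the sentence length, which is why a table based computation is preferable for anything but toy examples.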

$$
\sum_{\kappa' \subseteq \kappa \setminus \{k\}:\, |\kappa'| = i-1}
\underbrace{\Bigg( \sum_{K \in \pi(\kappa')} \prod_{j=1}^{i-1} p(k_j \mid w_j) \Bigg)}_{= L(i-1,\, \kappa')}
\cdot\, p(k \mid w_i) \,\cdot
\underbrace{\Bigg( \sum_{K \in \pi((\kappa \setminus \kappa') \setminus \{k\})} \prod_{j=i+1}^{s} p(k_j \mid w_j) \Bigg)}_{= R(i+1,\, (\kappa \setminus \kappa') \setminus \{k\})}
$$

L(i-1, κ') is the probability for all permutations of the unordered tag sublist κ' of
length i-1 over the left subsentence up to position i-1, and R(i+1, (κ\κ')\{k}) is the
probability for all permutations of the unordered complement tag sublist (κ\κ')\{k} of
length s-i over the right subsentence from position i+1. These values can be
recursively computed:



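$$
\begin{aligned}
L(i, \kappa') &= \sum_{k \in \kappa'} p(k \mid w_i) \sum_{K \in \pi(\kappa' \setminus \{k\})} \prod_{j=1}^{i-1} p(k_j \mid w_j) \\
              &= \sum_{k \in \kappa'} p(k \mid w_i) \cdot L(i-1, \kappa' \setminus \{k\}). \qquad (3)
\end{aligned}
$$

Similarly,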



Storing and re-using the values L(i, κ') and R(i, κ') in Eqs. (3) and (4) reduces
computational costs. For a given i, there are $\binom{|\kappa|}{i}$ unordered tag lists and thus
$\sum_{i=1}^{|\kappa|} \binom{|\kappa|}{i} \cdot i$ operations to perform to fully compute the table L (the same
holds for table R). However, no closed form or good estimation for this has been found,
so it is not clear whether the computation is efficient in the sense that it has a
polynomial computing time.
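The operation count for the table L can be evaluated numerically for a given number of tags. The following sketch assumes that the count is the sum over i of the number of sublists of length i times i, with all tags distinct; the function name is hypothetical.

```python
from math import comb


def table_operations(num_tags: int) -> int:
    """Sum over i of (number of unordered sublists of length i) times i."""
    return sum(comb(num_tags, i) * i for i in range(1, num_tags + 1))


counts = {n: table_operations(n) for n in (2, 4, 8, 16)}
```

For the values tried here the sum coincides with n · 2^(n-1), which would indicate exponential rather than polynomial growth in the number of tags per sentence.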



Implementation

The implementation of the EM algorithm is a direct consequence of the above
mentioned expressions. The implementation is further described by Figs. 2 and 3 for one
iteration. Some notes about the implementation:

- For technical reasons, each element of the unordered tag list κ gets a unique
  index in the range from 1 to |κ|. An unordered sublist κ' of length i is
  represented as an i-dimensional vector whose scalar elements are the indexes
  of the elements from κ that participate in κ'. This vector is incremented
  to successively obtain all unordered sublists of length i,
  where a_j is the j-th element of the vector representation of κ'. The addition or
  removal of a tag to or from κ' is reflected in the index of the tag. The index β
  of the complement unordered list of tags needed for accessing
  R(i, (κ\κ')\{k}) = R(β) is easily computed by

  $$\beta = 2^{|\kappa|} - 1 - \alpha - 2^{a-1}.$$

  For faster computation, there is a table whose j-th entry contains the value 2^j.

- The dynamic programming computation of the list R is performed by calling
  the subroutine that uses dynamic programming to compute the list L with a list
  of phrases W whose phrase order is reversed, i.e. w'_i = w_{s-i+1}.

- Sentences with an unequal number of tags and phrases are discarded. 


- The initial probabilities p(k, w) are read in from a file and p(w) is computed as
  marginal for p(k | w). The file simply lists k, w, and p(k, w) in one ASCII
  line. The estimated probabilities p(k, w) are written down in the same format
  and thus serve as input for the next iteration.
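Reading and writing the probability file could look like the sketch below. The text only states that k, w and p(k, w) appear in one ASCII line; the tab separator, the function names and the sorted output order are assumptions of this sketch.

```python
import io


def read_probabilities(stream):
    """Parse lines of the assumed form 'tag<TAB>phrase<TAB>probability'."""
    table = {}
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        tag, phrase, value = line.split("\t")
        table[(tag, phrase)] = float(value)
    return table


def write_probabilities(table, stream):
    """Write one line per (tag, phrase) pair in the same assumed format."""
    for (tag, phrase), value in sorted(table.items()):
        stream.write(f"{tag}\t{phrase}\t{value!r}\n")


buffer = io.StringIO()
write_probabilities({("origin", "from Hamburg"): 0.75}, buffer)
buffer.seek(0)
restored = read_probabilities(buffer)
```

Using the same format for input and output lets the result of one iteration be fed into the next without conversion.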


Figure 2 illustrates a flow chart for iteratively calculating the probability L(i, κ') for all
permutations of the unordered tag sublist κ' of length i over the left subsentence up to
position i.

Initially, in step 200 the probability L(0, {}) is set to unity, before the index i is set to
i = 1 in step 202.

In step 204, a loop starts and each unordered sublist κ' of length i is selected. In the
subsequent step 206, the probability L(i, κ') for each selected unordered sublist is
set to zero before, in the next step 208, each tag k which is an element of the unordered
sublist is selected. In step 210 finally, the probability L(i, κ') is calculated according to:

$$L(i, \kappa') = L(i, \kappa') + L(i-1, \kappa' \setminus \{k\}) \cdot p(k \mid w_i).$$

In step 212 it is checked whether the index i is smaller than or equal to the number of
words in the phrase. If i ≤ |W| in step 212, then i is incremented by one, and the
procedure returns to step 204. When in contrast i > |W|, then the procedure stops in
step 214.

The calculation of the probability for all permutations of the unordered complement tag
sublist of the right subsentence from position i + 1 is performed correspondingly.
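The table indexing mentioned in the implementation notes can be sketched with bit masks. The encoding of a sublist as the sum of 2^(a_j - 1) over its tag indexes is an assumption of this sketch, inferred from the complement formula; the function names are hypothetical.

```python
def sublist_index(tag_indexes):
    """Encode a sublist by its 1-based tag indexes as a bit mask (assumed encoding)."""
    return sum(1 << (a - 1) for a in tag_indexes)


def complement_index(num_tags, alpha, a):
    """Index of the complement of the sublist `alpha` with tag `a` also removed."""
    return (1 << num_tags) - 1 - alpha - (1 << (a - 1))


alpha = sublist_index([1, 3])          # sublist holding tags 1 and 3 out of 4
beta = complement_index(4, alpha, 2)   # remove tag 2 as well, tag 4 remains
```

With four tags, removing the sublist {1, 3} and the tag 2 leaves exactly the sublist {4}, whose index the complement formula reproduces.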

Figure 3 is illustrative of a flow chart diagram for calculating a mapping probability
p(k, w) on the basis of the EM algorithm. In step 300 for all tags k and phrases w the
probability p(k | w) is initialized by setting q = 0 and setting q(k, w) = 0, before in
step 302 one of the training corpus sentences is selected. Since every sentence of the
training corpus is taken into account for the grammar learning, the following step 304
has to be applied to all sentences of the training corpus.

After a sentence of the training corpus has been selected in step 302 it is further
processed in step 304, in which the steps 306, 308, 310, and 312 are successively
applied. In step 306, an unordered tag list κ as well as an ordered phrase list W are
selected. In the next step 308, the dynamic

length i - 1. For each unordered sublist the step 316 is performed, selecting each tag
k ∉ κ' and performing the following calculation in step 318:

$$q' = L(i-1, \kappa') \cdot p(k \mid w_i) \cdot R(i+1, (\kappa \setminus \kappa') \setminus \{k\}),$$

where q' is further processed in step 320 according to:

$$q(k, w_i) = q(k, w_i) + q' \quad \text{and} \quad q = q + q'.$$

When the steps 318 and 320 have been executed for each tag k ∉ κ' in step 316, when
step 316 has been performed for each unordered sublist of length i - 1 in step 314,
when step 314 has been performed for each index i ≤ |W| in step 312, and when finally
the entire procedure given by step 312 has been performed for each sentence of the
training corpus, then in step 322 the mapping probability is determined according to:

$$p(k, w) = q(k, w) / q \quad \forall\, k, w.$$

CLAIMS



1. A method of calculating a mapping probability that a semantic tag of a set of
candidate semantic tags is assigned to a phrase, wherein the calculation of the 
mapping probability is performed by means of a statistical procedure based on a 
set of phrases constituting a corpus of sentences, each of the phrases having 
assigned a set of candidate semantic tags. 

2. The method according to claim 1, for each phrase further comprising calculating 
a set of mapping probabilities, providing the probability for each semantic tag of 
the set of candidate semantic tags being assigned to the phrase. 

3. The method according to claim 2, further comprising determining one semantic
tag of the set of candidate semantic tags having the highest mapping probability 
of the set of mapping probabilities and mapping the one semantic tag to the 
phrase. 

4. The method according to any one of the claims 1 to 3, wherein the statistical 
procedure comprises an expectation maximization algorithm. 

5. The method according to claim 3 or 4, further comprising storing of performed 
mappings between a candidate semantic tag and a phrase in form of a mapping 
table in order to derive a grammar being applicable to unknown sentences or 
unknown phrases. 

6. A computer program product for calculating a mapping probability that a
semantic tag of a set of candidate semantic tags is assigned to a phrase, wherein
the calculation of the mapping probability is performed by means of a statistical
procedure based on a set of phrases constituting a corpus of sentences, each of
the phrases having assigned a set of candidate semantic tags.

7. The computer program product according to claim 6, for each phrase further
comprising program means for calculating a set of mapping probabilities,
providing the probability for each semantic tag of the set of candidate semantic
tags being assigned to the phrase.

8. The computer program product according to claim 7, further comprising
program means for determining one semantic tag of the set of candidate
semantic tags having the highest mapping probability of the set of mapping
probabilities and mapping the one semantic tag to the phrase.

9. The computer program product according to any one of the claims 6 to 8,
wherein the statistical procedure comprises an expectation maximization
algorithm.

10. The computer program product according to claim 8 or 9, further comprising
program means for storing of performed mappings between a semantic tag and a
phrase or a sequence of phrases in form of a mapping table in order to derive a
grammar being applicable

11. A system for mapping a semantic tag to a phrase, comprising means for
calculating a mapping probability that a semantic tag of a set of candidate
semantic tags is assigned to a phrase, wherein the calculation of the mapping
probability is performed by means of a statistical procedure based on a set of
phrases constituting a corpus of sentences, each of the phrases having assigned a
set of candidate semantic tags.

12. The system according to claim 11, for each phrase further comprising
calculating a set of mapping probabilities, providing the probability for each
semantic tag of the set of candidate semantic tags being assigned to the phrase.

13. The system according to claim 12, further comprising determining one semantic
tag of the set of candidate semantic tags having the highest mapping probability
of the set of mapping probabilities and mapping the one semantic tag to the
phrase.

14. The system according to any one of the claims 11 to 13, wherein the statistical
procedure comprises an expectation maximization algorithm.

15. The system according to claim 13 or 14, further comprising means for storing of
performed mappings between a semantic tag and a phrase or a sequence of
phrases in form of a mapping table in order to derive a grammar being
applicable to unknown sentences or unknown phrases or unknown sequences of
phrases.

ABSTRACT

Mapping of semantic tags to phrases for grammar generation 

The present invention relates to a method, a system and a computer program product for
mapping of semantic tags to phrases within a training corpus of weakly annotated
sentences, thereby generating a grammar which can be applied to unknown sentences
for the purpose of language understanding. The method is based on a probabilistic
estimation that a given phrase is mapped to a semantic tag of a set of candidate
semantic tags. The mapping and the generation of the grammar are performed according
to a maximum mapping probability of a set of mapping probabilities of the given phrase
and the set of candidate semantic tags. In particular, the determination of the mapping
probability makes use of an expectation maximization algorithm.

(figure 1) 